Abstract

We present efficient implementations of a number of operations for quantum computers. These include controlled phase adjustments of the amplitudes in a superposition, permutations, approximations of transformations, and generalizations of the phase adjustments to block matrix transformations. These operations generalize those used in proposed quantum search algorithms.

1 Introduction 

Shor's factorization algorithm [?] and Grover's search algorithm [?, ?] demonstrate that quantum computers can solve certain problems faster than classical computers. It has been well known for over a decade that any classical algorithm has a quantum analog of comparable complexity [?], and quantum analogs of classical building blocks have been studied [?]. But to exploit the power of quantum computers and create algorithms in new complexity classes, we need building blocks that have no classical analogs but instead take advantage of quantum parallelism by modifying and mixing amplitudes in superpositions.

Two sorts of tools have been used effectively in the quantum algorithms developed so far: first, transformations that mix amplitudes, such as the Fourier and Walsh transforms; second, selective adjustment of the phases of certain states which, when combined with a mixing transform, promotes amplitude cancellation or amplification. Such phase adjustments form the basis of search algorithms for NP problems [12, 11]. Here, we discuss efficient implementations of relative phase changes and of mixing transformations that combine amplitudes from only a small number of states. The choice of phases, and of which states to mix, depends on a classically efficiently computable function $f$. As we are dealing with tools for algorithms in general, specific problems will not be addressed, so $f$ will necessarily remain abstract. We discuss implementations of phase changes, of permutations, of approximations of transformations, and generalizations of the phase change techniques to block matrix transformations. For each of these transformations, we describe the resources in terms of time, number of calls to $f$, and number of additional qubits needed for the implementation. Our aim is simply to describe a collection of efficiently implementable transformations which we hope will allow future designers of quantum algorithms to take a somewhat higher-level approach when thinking about how to take advantage of quantum parallelism. Furthermore, the implementations we describe for more general operations than have been used in algorithms proposed to date may form the basis for more effective algorithms.

We assume that the reader is familiar with quantum computing and the standard terminology and notation of that field. For an introduction to the field, see [?].

1.1 General set-up 

Throughout we will be describing transformations of an n qubit system. In order 
to implement these transformations we will assume at times that we also have 
access to an m qubit register in which we can store values which will help us 
perform the desired transformation. We are particularly interested in describing 
transformations that can be efficiently implemented, where by "efficient" we 
mean that the implementation takes a number of steps that is polynomial in n. 

We first concern ourselves with transformations that change the relative phases of the components that make up a superposition. Such transformations correspond to acting on the state with a diagonal matrix $D$. Conversely, because quantum operations are unitary, any operation described by a diagonal matrix consists of such phase adjustments. Since a global phase change has no physical meaning, the matrix is only well defined up to multiplication by a constant. To specify a general phase change would require specifying all $N = 2^n$ diagonal elements $D_{xx}$ of $D$. Only phase changes that can be expressed in a concise form are practical. For this reason, we will assume that the phase changes are determined by an efficiently computable function $f$. For example, the function $f(x)$ for Grover's search algorithm computes whether or not $x$ is one of the desired elements. In Hogg's algorithms, $f(x)$ depends, for instance, on the number of conflicts a state $x$ has with the constraints and on the size of $x$. Here, we will take a general $f$ that is efficiently computable classically.

At first glance, the problems we are concerned with may appear trivial. How hard could it be to implement a diagonal matrix? However, these are $2^n \times 2^n$ matrices, and we are interested in implementing them in a number of steps that is only polynomial in $n$. Furthermore, there are many families of transformations that cannot be efficiently implemented, even when they can be described in terms of an $f$ that can be efficiently computed. To illustrate this point we describe a permutation that can be concisely described in terms of $f$, but which cannot be efficiently implemented.

Imagine we are in the simplest set-up for Grover's search algorithm, where we are looking for a single item in an unstructured database of size $N = 2^n$. The efficiently computable function $f(x)$ simply checks whether $x$ is the desired item, so $f(x) = 1$ when $x$ is the desired item and $f(x) = 0$ otherwise. One way to find the item would be to use a transformation that swaps the state $|00\ldots0\rangle$ with the state $|x\rangle$ for which $f(x) = 1$. If such a transformation could be efficiently implemented, we could find the desired item much more quickly than Grover's algorithm does, simply by starting with $|00\ldots0\rangle$, applying the transformation, and then reading the output, which would be the desired state. However, since Grover's algorithm is optimal, this transformation cannot be efficiently implemented.

Throughout this paper, we use the fact that efficiently implementable classical functions can be implemented with comparable complexity on a quantum computer using standard building blocks [?, 10, ?]. We assume perfect operations, so we do not deal with error control. In this paper a phase change of $e^{2\pi i/2^m}$ will be treated as one step, no matter how large $m$ is.

Let $f(x)$ be a classical polynomially computable function. Quantum parallelism can be used to compute the values of $f(x)$ for all $x$ at the same time. This computation uses an additional register to hold the values of $f$. We will ignore any temporary workspace needed to compute $f$, provided it returns to its original state by the end of the computation. We use the following standard transform to implement the quantum parallel computation of $f(x)$:

$$U_f : |x, a\rangle \mapsto |x, a \oplus f(x)\rangle, \qquad (1)$$

where $\oplus$ is the bitwise exclusive-OR. Consider a superposition of $x$ values,

$$S = \sum_x a_x |x\rangle.$$

Then Eq. (1) transforms $\sum_x a_x |x\rangle \otimes |0\rangle$ as

$$\sum_x a_x |x, 0\rangle \mapsto \sum_x a_x |x, f(x)\rangle. \qquad (2)$$
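To make Eqs. (1) and (2) concrete, the following Python sketch simulates $U_f$ classically. It is our own illustration, not part of the original text: it assumes a joint state of the $x$ register and the extra register is stored as a dictionary mapping basis pairs $(x, a)$ to complex amplitudes, and the helper name `apply_uf` and the choice of $f$ are hypothetical. A classical simulation like this needs memory exponential in $n$; it only illustrates the linear action of $U_f$.

```python
import math

def apply_uf(state, f):
    """Apply U_f |x, a> = |x, a XOR f(x)> (Eq. (1)) linearly to {(x, a): amplitude}."""
    out = {}
    for (x, a), amp in state.items():
        key = (x, a ^ f(x))
        out[key] = out.get(key, 0) + amp
    return out

# Uniform superposition over n = 2 qubits, extra register in |0>.
n = 2
amp = 1 / math.sqrt(2 ** n)
state = {(x, 0): amp for x in range(2 ** n)}

f = lambda x: (x * x) % 3      # hypothetical, classically computable f
after = apply_uf(state, f)      # Eq. (2): sum_x a_x |x, f(x)>
print(sorted(after))            # [(0, 0), (1, 1), (2, 1), (3, 0)]
```

Applying `apply_uf` a second time returns the extra register to $|0\rangle$, because exclusive-OR is its own inverse; this is the uncomputation step used repeatedly below.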



1.2 A summary of the techniques described 

In implementing quantum algorithms it will be useful to have a variety of techniques, depending on whether the number of qubits or the coherence time (number of operations) is the main limiting factor.

This paper describes several methods for implementing relative phase changes on the components of an $n$-qubit quantum state, which can be represented as $2^n \times 2^n$ diagonal matrices $D$. Specifically, if the phase $D_{xx}$ depends on an efficiently computable function $f(x)$, then

• if there are only $k$ distinct phase values, $D$ can be implemented in $O(k)$ steps and two evaluations of $f$. The technique requires $\lceil \log_2 k \rceil$ additional qubits.

• the well-known technique for inverting the phase of states selected by $f(x) = 1$ can be extended to change the phase of selected states by a single phase value that is a $2^m$th root of unity. This extension requires at most $m$ evaluations of $f$, an average of fewer than 2 evaluations, and one additional qubit.

• if all phases in $D$ are multiples of a $k$th root of unity, $D$ can be implemented with a single application of $f$ using $\lceil \log_2 k \rceil$ additional qubits and only $O(\log_2 k)$ operations to prepare these additional qubits.

• if the phases in $D$ need only be computed to $k$-bit binary precision, $D$ can be implemented in $O(k)$ operations using one additional qubit and $k$ function evaluations.

• if $D$ is decomposable, in that it can be written as a tensor product of single-qubit rotations, it can be implemented trivially in $O(n)$ steps without any additional qubits or function calls. We give a necessary and sufficient condition for the decomposability of $D$.

The utility of diagonal matrices is enhanced if it is possible to perform permutations on the quantum state efficiently. We present a technique for implementing an arbitrary permutation $g$ on an $n$-bit quantum state using one evaluation of $g$ and one evaluation of $g^{-1}$, with $n$ additional qubits.

Finally, we show how some of the implementation techniques for diagonal 
matrices and permutations can be extended to block diagonal matrices, which 
effect amplitude mixing among a small number of states. 

1.3 Related Work 

In [16], Høyer shows how to efficiently implement certain unitary transformations that can be represented as generalized Kronecker products. The technique applies to general transformations along the lines of the quantum Fourier transform. His paper includes an efficient implementation of certain permutations and an implementation of block diagonal matrices that is similar to the one described in section 6.1.

Knill [17] discusses the approximation of quantum transformations and proves an upper bound on the complexity of implementing arbitrary unitary transformations. The upper bound, while smaller than previously known results, is still exponential in the number of qubits. Knill also shows that arbitrary unitary transformations cannot be efficiently approximated.

Tucci [18] defines a "quantum compiler" based on the Cosine/Sine decomposition of a given unitary matrix. In principle his approach seems promising. However, in its present form the quantum compiler takes the actual matrix as input, as opposed to symbolic input, so the space and time complexity just for the input is exponential in the number of qubits $n$. Furthermore, the current algorithm rarely generates polynomial implementations even when they are possible.



2 Independent Phase Changes 

In this section we discuss the efficiency of implementing phase changes on the components of an $n$-qubit state, represented by an $N \times N$ diagonal matrix $D$ with diagonal entries $d_x$ for $x$ ranging from $0$ to $N - 1$, where $N = 2^n$. The methods vary in their restrictions on the $d_x$'s, their efficiency in terms of the number of operations and calls to $U_f$ needed, and the number of additional qubits required.

In the worst case, a diagonal matrix $D$ of size $N \times N$ can be implemented in $O(N)$ steps by iterating the following procedure over all $N$ values $y$. Let $\delta_y(x)$ be the function that is 1 when $x = y$ and 0 otherwise. Apply $U_{\delta_y}$ using Eq. (1) to the original state $\sum_x a_x |x, 0\rangle$ to get $\sum_x a_x |x, \delta_y(x)\rangle$. Then multiply the state by $I \otimes G_y$, where

$$G_y = \begin{pmatrix} 1 & 0 \\ 0 & d_y \end{pmatrix}$$

and $d_y$ is the diagonal value of $D$ corresponding to state $y$. The $\delta_y(x)$ value in the additional register can be removed by repeating the transform $U_{\delta_y}$. This argument shows how a general diagonal matrix can be implemented. Note that this implementation is not efficient, as it is exponential in $n$. As we describe in this paper, many special forms of the matrix can be implemented much more efficiently. However, this implementation will be used in the sequel to implement $k \times k$ diagonal matrices, with $k$ polynomial in $n$, as parts of the efficient implementations discussed later.
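A minimal classical simulation of this brute-force construction follows, assuming the same dictionary representation of $(x, a)$ amplitudes as in the earlier sketch (the helper names are hypothetical). It performs one round of $U_{\delta_y}$, $I \otimes G_y$, $U_{\delta_y}$ per basis state $y$, which makes the $O(N)$ cost explicit.

```python
import cmath

def apply_uf(state, f):
    """U_f |x, a> = |x, a XOR f(x)> on {(x, a): amplitude}."""
    out = {}
    for (x, a), amp in state.items():
        key = (x, a ^ f(x))
        out[key] = out.get(key, 0) + amp
    return out

def apply_diagonal_bruteforce(state, d):
    """Implement diag(d) on the x register with N rounds of U_delta_y, I (x) G_y, U_delta_y.

    state maps (x, 0) -> a_x; d[y] is the diagonal entry D_yy. Cost grows like N = len(d).
    """
    for y, d_y in enumerate(d):
        delta_y = lambda x, y=y: 1 if x == y else 0
        state = apply_uf(state, delta_y)                  # |x, 0> -> |x, delta_y(x)>
        state = {(x, a): amp * (d_y if a == 1 else 1)     # I (x) G_y with G_y = diag(1, d_y)
                 for (x, a), amp in state.items()}
        state = apply_uf(state, delta_y)                  # repeat U_delta_y to erase delta_y(x)
    return state

# Usage: a hypothetical 4 x 4 diagonal of arbitrary phases (n = 2).
d = [cmath.exp(1j * t) for t in (0.0, 0.3, 1.1, 2.0)]
state = {(x, 0): 0.5 for x in range(4)}
out = apply_diagonal_bruteforce(state, d)
```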



2.1 A small number of distinct phases 

This subsection describes a method for efficiently implementing a phase change involving only polynomially many distinct phases $r$. It requires $O(r)$ operations, two calls to $U_f$, and $\lceil \log_2 r \rceil$ additional qubits.

Suppose there are $r$ distinct values $p_0, \ldots, p_{r-1}$ of $d_x$ such that $r \le k = 2^m$ for some $k$ that is a power of 2. Further suppose $f(x)$ is a rapidly computable function from $n$-bit strings to the values $\{0, \ldots, r - 1\}$ such that $d_x = p_{f(x)}$. Let $P$ be the $k \times k$ diagonal matrix with diagonal elements $p_0, \ldots, p_{k-1}$, where the remaining $k - r$ elements are chosen arbitrarily. Starting with the superposition

$$\sum_x a_x |x, 0\rangle,$$

we first apply the transform of Eq. (1), with the result given in Eq. (2). We then operate with $I \otimes P$ on this result, giving

$$\sum_x p_{f(x)} a_x |x, f(x)\rangle. \qquad (3)$$

Finally, the extra register holding the index can be disentangled by reversing the computation of the index. Since bitwise exclusive-OR is its own inverse, this disentangling can be accomplished by redoing the $U_f$ operation, giving

$$\sum_x p_{f(x)} a_x |x, f(x)\rangle \mapsto \sum_x p_{f(x)} a_x |x, 0\rangle, \qquad (4)$$

which is the desired phase change.

This algorithm requires two evaluations of $U_f$. In addition to depending on the efficiency with which $f(x)$ can be computed, this algorithm depends on the efficiency of implementations of the matrix $P$. As was shown above, the direct implementation of a $k \times k$ diagonal matrix costs at most $O(k)$. Note that the cost to implement $I \otimes P$ is the same as that to implement just $P$.

Working with the $k \times k$ diagonal matrix $P$ of the distinct phase choices, instead of the full $N \times N$ diagonal matrix $D$, reduces the cost of implementing $D$. In particular, when $k$ depends polynomially on $n$, the matrix $D$ can be implemented in polynomial time using $P$, even though the size of the matrix $D$ itself increases exponentially with $n$.
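The three steps of Eqs. (2) to (4) can be sketched as follows, again as a classical simulation under the dictionary representation assumed in the earlier sketches. Here `p` lists the diagonal of $P$, padded with an arbitrary entry so that its length $k$ is a power of two; the function names and the example $f$ are hypothetical.

```python
import cmath

def apply_uf(state, f):
    """U_f |x, a> = |x, a XOR f(x)> on {(x, a): amplitude}."""
    out = {}
    for (x, a), amp in state.items():
        key = (x, a ^ f(x))
        out[key] = out.get(key, 0) + amp
    return out

def apply_few_phases(state, f, p):
    """Section 2.1 sketch: d_x = p[f(x)], using two calls to U_f and I (x) P.

    state maps (x, 0) -> a_x; p is the diagonal of the k x k matrix P.
    """
    state = apply_uf(state, f)                                     # Eq. (2)
    state = {(x, a): p[a] * amp for (x, a), amp in state.items()}  # I (x) P, Eq. (3)
    return apply_uf(state, f)                                      # Eq. (4)

# Usage: n = 3, r = 3 distinct phases padded to k = 4 (p[3] is arbitrary).
n = 3
p = [1, cmath.exp(0.7j), cmath.exp(2.1j), 1]
f = lambda x: bin(x).count("1") % 3      # hypothetical phase index in {0, 1, 2}
state = {(x, 0): 1 / 2 ** (n / 2) for x in range(2 ** n)}
out = apply_few_phases(state, f, p)
```

Only the $k$ entries of `p` are ever touched by the middle step, which is the point of the construction: its cost depends on $k$, not on $N$.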



2.2 Roots of unity 

When the desired phases are roots of unity, D can be implemented somewhat 
more efficiently. By using fewer operations than the general case given above, 
these alternate techniques will likely be somewhat less sensitive to errors, in 
addition to the advantage of faster operation. 



2.2.1 Changing the sign 

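The following technique was introduced by Boyer et al. Let $f(x) = 1$ if the sign of $x$ is to change, and $f(x) = 0$ otherwise. The additional register is set to the superposition $|a\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. The operation $U_f$ of Eq. (1) then gives a superposition in which the phases of those $x$ with $f(x) = 1$ are inverted and $|a\rangle$ remains unchanged. This is readily seen as follows:

$$
\begin{aligned}
U_f\Big(\sum_x a_x |x\rangle \otimes \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\Big)
&= \tfrac{1}{\sqrt{2}}\Big(\sum_{x \in X_0} a_x |x, 0\rangle - \sum_{x \in X_0} a_x |x, 1\rangle + \sum_{x \in X_1} a_x |x, 1\rangle - \sum_{x \in X_1} a_x |x, 0\rangle\Big) \\
&= \Big(\sum_{x \in X_0} a_x |x\rangle - \sum_{x \in X_1} a_x |x\rangle\Big) \otimes \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle),
\end{aligned}
$$

where $X_0 = \{x \mid f(x) = 0\}$ and $X_1 = \{x \mid f(x) = 1\}$. The operation introduces a phase factor of $-1$ for exactly those $x \in X_1$, as desired. It also leaves $|a\rangle$ unchanged; in particular, the extra register is not entangled with the $x$ values.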

This technique requires only one call to $U_f$, but restricts the phases to $1$ and $-1$. Otherwise it requires the same resources as the method described in section 2.1. The method described here can be generalized somewhat, to phases that are $2^m$th roots of unity, but it cannot be generalized to arbitrary phase values.
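The sign-change trick can be checked with a short simulation: a sketch assuming the extra qubit is stored alongside $x$ as a pair $(x, a)$, with `flip_signs` and the marked state as hypothetical choices. One call to $U_f$ leaves every $x$ with amplitude $(-1)^{f(x)} a_x$ while the extra qubit stays in $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.

```python
import math

def apply_uf(state, f):
    """U_f |x, a> = |x, a XOR f(x)> on {(x, a): amplitude}."""
    out = {}
    for (x, a), amp in state.items():
        key = (x, a ^ f(x))
        out[key] = out.get(key, 0) + amp
    return out

def flip_signs(amps, f):
    """Section 2.2.1 sketch: one call to U_f with the extra qubit in (|0> - |1>)/sqrt(2).

    amps maps x -> a_x; returns the joint (x, extra qubit) state after U_f.
    """
    s = 1 / math.sqrt(2)
    joint = {}
    for x, a in amps.items():
        joint[(x, 0)] = a * s
        joint[(x, 1)] = -a * s
    return apply_uf(joint, f)

amps = {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}
f = lambda x: 1 if x == 2 else 0        # hypothetical marked state x = 2
joint = flip_signs(amps, f)
# Each x now carries (-1)^f(x) * a_x times the unchanged (|0> - |1>)/sqrt(2).
```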

2.2.2 No direct generalization to arbitrary phase values 

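Suppose we want to change the phase of all of the elements of $X_1$ by $\gamma$. Instead of using $|a\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, we use $|a\rangle = \frac{1}{\sqrt{2}}(|0\rangle + \gamma|1\rangle)$. The result of applying $U_f$ is

$$\frac{1}{\sqrt{2}}\Big(\sum_{x \in X_0} a_x |x, 0\rangle + \gamma \sum_{x \in X_0} a_x |x, 1\rangle + \sum_{x \in X_1} a_x |x, 1\rangle + \gamma \sum_{x \in X_1} a_x |x, 0\rangle\Big). \qquad (5)$$

In general, the resulting state is not simply a tensor product of $x$ and $a$ with some additional phase shift. Usually, $x$ and $a$ become entangled.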

A possible approach to extracting the desired state from this entanglement is to measure the last bit. The state in Eq. (5) then becomes, up to normalization, either

$$\sum_{x \in X_0} a_x |x, 0\rangle + \gamma \sum_{x \in X_1} a_x |x, 0\rangle$$

or

$$\gamma \sum_{x \in X_0} a_x |x, 1\rangle + \sum_{x \in X_1} a_x |x, 1\rangle.$$

If the measurement returns 0, we have achieved the desired phase shift. To get the desired result when the measured value is 1, we try multiplying the state by $\gamma$ to get

$$\gamma^2 \sum_{x \in X_0} a_x |x, 1\rangle + \gamma \sum_{x \in X_1} a_x |x, 1\rangle.$$

We get the desired result only when $\gamma^2 = 1$.

2.2.3 Phase changes by a $2^m$th root of unity

While the preceding calculation shows that general phase changes cannot be implemented with the technique for changing signs, the behavior when the last bit is measured does suggest a way to change the phase of the elements of $X_1$ by a $2^m$th root of unity.

For example, this trick can be used to rotate part of the state by $i$ or $-i$. Let $\gamma = i$. Perform $U_f$ and measure the last bit. If the result is 0, the state will be

$$\sum_{x \in X_0} a_x |x, 0\rangle + i \sum_{x \in X_1} a_x |x, 0\rangle,$$

and if the result is 1, the state will be

$$i \sum_{x \in X_0} a_x |x, 1\rangle + \sum_{x \in X_1} a_x |x, 1\rangle = i\Big(\sum_{x \in X_0} a_x |x, 1\rangle - i \sum_{x \in X_1} a_x |x, 1\rangle\Big).$$

Except for a constant factor, the two states differ only in the phase of $x \in X_1$, and one can be transformed into the other by applying a phase change of $-1$ to $X_1$. Thus half the time, when 0 is measured, only one call to $f(x)$ is needed. Otherwise a second phase change is needed, which requires an additional call to $f(x)$, for a total of two calls.

By iterating this process, one can achieve arbitrary rotations by $2^m$th roots of unity. Let $\gamma = e^{2\pi i/2^m}$. The transformation and measurement of the last bit give

$$\sum_{x \in X_0} a_x |x, 0\rangle + e^{2\pi i/2^m} \sum_{x \in X_1} a_x |x, 0\rangle$$

or

$$e^{2\pi i/2^m} \sum_{x \in X_0} a_x |x, 1\rangle + \sum_{x \in X_1} a_x |x, 1\rangle$$

when the last bit is measured to be 0 or 1, respectively. In the latter case the state is, up to a constant overall phase,

$$\sum_{x \in X_0} a_x |x, 1\rangle + e^{-2\pi i/2^m} \sum_{x \in X_1} a_x |x, 1\rangle.$$

Essentially, $X_1$ has been rotated by the right amount, but in the wrong direction. The desired state can be achieved by rotating $X_1$ by $e^{2\pi i/2^{m-1}}$, twice the original amount, using the same process. In the worst case, rotating the elements of $X_1$ by $e^{2\pi i/2^m}$ requires $O(m)$ invocations of $U_f$. Surprisingly, the average number of calls to $f(x)$ for this rotation is only $2 - 1/2^{m-1}$: each round succeeds with probability $1/2$, and the final round, with $\gamma = -1$, always succeeds. This average is always less than two, so on average this technique requires fewer calls than the method given in section 2.1.
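The measurement-based procedure, and the average call count, can be sketched with a sampled simulation. This is our own illustration under stated assumptions: measurement is modelled by drawing from Python's `random.Random`, the post-measurement state is renormalized, and names such as `rotate_by_root` and `expected_calls` are hypothetical. Each round uses $\gamma = e^{2\pi i/2^{\ell}}$, starting at $\ell = m$ and decreasing $\ell$ after every outcome 1, which doubles the correction angle as described above.

```python
import cmath
import math
import random

def measured_round(psi, f, gamma, rng):
    """One call to U_f with the extra qubit in (|0> + gamma|1>)/sqrt(2), then measure it.

    psi maps x -> a_x. Returns the renormalized x-register state and the measured bit.
    """
    s = 1 / math.sqrt(2)
    joint = {}
    for x, a in psi.items():
        c0, c1 = a * s, gamma * a * s
        if f(x):                      # U_f flips the extra qubit when f(x) = 1
            c0, c1 = c1, c0
        joint[(x, 0)], joint[(x, 1)] = c0, c1
    p0 = sum(abs(joint[(x, 0)]) ** 2 for x in psi)   # equals 1/2 because |gamma| = 1
    outcome = 0 if rng.random() < p0 else 1
    norm = math.sqrt(p0 if outcome == 0 else 1 - p0)
    return {x: joint[(x, outcome)] / norm for x in psi}, outcome

def rotate_by_root(psi, f, m, rng):
    """Multiply a_x by e^{2 pi i / 2^m} when f(x) = 1, up to a global phase.

    After outcome 1 the correction still needed is twice the previous angle,
    so the next round uses gamma = e^{2 pi i / 2^(level - 1)}.
    """
    calls = 0
    for level in range(m, 0, -1):
        gamma = cmath.exp(2j * math.pi / 2 ** level)
        psi, outcome = measured_round(psi, f, gamma, rng)
        calls += 1
        if outcome == 0 or level == 1:   # gamma = -1 works for either outcome
            break
    return psi, calls

def expected_calls(m):
    """E(1) = 1 and E(m) = 1 + E(m - 1) / 2, which equals 2 - 1 / 2^(m - 1)."""
    return 1.0 if m == 1 else 1 + expected_calls(m - 1) / 2

def matches_up_to_phase(out, target):
    """True if out = c * target for a single unit-modulus constant c."""
    phase = out[0] / target[0]
    return abs(abs(phase) - 1) < 1e-9 and all(abs(out[x] - phase * target[x]) < 1e-9 for x in target)

# Usage: mark odd x (hypothetical f) and rotate them by an 8th root of unity (m = 3).
f = lambda x: x & 1
psi = {x: 0.5 for x in range(4)}
target = {x: a * (cmath.exp(2j * math.pi / 8) if f(x) else 1) for x, a in psi.items()}
out, calls = rotate_by_root(psi, f, 3, random.Random(0))
```

The final state agrees with the target only up to a global phase, which is why `matches_up_to_phase` divides out the phase of one entry before comparing.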



2.2.4 $k$th roots of unity



A different generalization of the sign change technique of section 2.2.1 allows additional function calls to be avoided completely. Furthermore, multiple phases, even up to $2^n$ of them, can be achieved in this way, as long as they are all multiples of the same underlying phase $\omega = e^{2\pi i/k}$. This technique requires only one function call plus $\log_2 k$ steps, and $\log_2 k$ additional qubits.

In this case, the bitwise exclusive-OR in Eq. (1) is replaced by modular addition. Specifically, we use

$$U_f : |x, a\rangle \mapsto |x, a + f(x) \bmod k\rangle. \qquad (6)$$

Here, $f(x)$ maps states to the set $\{0, \ldots, k - 1\}$, and the desired phase adjustment for state $x$ is $\omega^{f(x)}$, where $\omega = e^{2\pi i/k}$. To perform this adjustment with a single evaluation of $f(x)$, we set the extra register in the superposition

$$R = \frac{1}{\sqrt{k}} \sum_{h=0}^{k-1} \omega^{k-h} |h\rangle. \qquad (7)$$

The superposition $R$ can be constructed in $\log k$ steps using the technique described in section [?].

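To see the behavior of $U_f$ of Eq. (6) acting on $S \otimes R$, write

$$S = \sum_{j=0}^{k-1} \sum_{x \in X_j} a_x |x\rangle, \qquad (8)$$

where $X_j$ is the set of states for which $f(x) = j$. Then

$$S \otimes R = \frac{1}{\sqrt{k}} \sum_{j=0}^{k-1} \sum_{x \in X_j} \sum_{h=0}^{k-1} a_x \omega^{k-h} |x, h\rangle. \qquad (9)$$

Operating with Eq. (6) then gives

$$\frac{1}{\sqrt{k}} \sum_{j=0}^{k-1} \sum_{x \in X_j} \sum_{h=0}^{k-1} a_x \omega^{k-h} |x, h + j \bmod k\rangle. \qquad (10)$$

For any $j$, as $h$ ranges from $0$ to $k - 1$, $m = h + j \bmod k$ ranges over these values as well. In terms of $m$, $h = m - j \bmod k$ and $k - h = j + (k - m) \bmod k$. Furthermore, since $\omega^k = 1$, we can write the sum as

$$\frac{1}{\sqrt{k}} \sum_{j=0}^{k-1} \sum_{x \in X_j} \sum_{m=0}^{k-1} a_x \omega^{j} \omega^{k-m} |x, m\rangle \qquad (11)$$

or

$$\Big(\sum_{j=0}^{k-1} \omega^{j} \sum_{x \in X_j} a_x |x\rangle\Big) \otimes \frac{1}{\sqrt{k}} \sum_{m=0}^{k-1} \omega^{k-m} |m\rangle, \qquad (12)$$

which is just $DS \otimes R$.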
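The identity $U_f(S \otimes R) = DS \otimes R$ can be checked numerically with the sketch below, which assumes the dictionary representation used earlier with the extra register holding values $0, \ldots, k-1$. The example $f$ and the function names are hypothetical; the code builds $R$ from Eq. (7) directly rather than with the $\log k$-step preparation.

```python
import cmath
import math

def apply_uf_mod(state, f, k):
    """Eq. (6): U_f |x, a> = |x, a + f(x) mod k>, applied to {(x, a): amplitude}."""
    out = {}
    for (x, a), amp in state.items():
        key = (x, (a + f(x)) % k)
        out[key] = out.get(key, 0) + amp
    return out

def kth_root_phases(amps, f, k):
    """Section 2.2.4 sketch: one call to U_f maps S (x) R to (D S) (x) R.

    amps maps x -> a_x; D_xx = omega^f(x) with omega = e^{2 pi i / k}.
    The extra register is R = (1/sqrt k) sum_h omega^(k - h) |h>, Eq. (7).
    """
    omega = cmath.exp(2j * math.pi / k)
    r = [omega ** (k - h) / math.sqrt(k) for h in range(k)]
    joint = {(x, h): a * r[h] for x, a in amps.items() for h in range(k)}  # Eq. (9)
    return apply_uf_mod(joint, f, k), omega, r

# Usage: k = 4 (two extra qubits), hypothetical f(x) = x mod 4 on n = 3 qubits.
k = 4
amps = {x: 1 / math.sqrt(8) for x in range(8)}
f = lambda x: x % k
out, omega, r = kth_root_phases(amps, f, k)
# Eq. (12): every amplitude factors as omega^f(x) * a_x * R_m.
```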



3 Approximation of Phase Changes 

An arbitrary phase can be approximated by a series of shifts by roots of unity. For instance, consider $\phi = e^{2\pi i p}$ for $0 \le p < 1$. Let $p = 0.b_1 b_2 \ldots$ be the binary expansion of $p$ to the desired precision. Then

$$\phi = \exp\Big[2\pi i \sum_j b_j 2^{-j}\Big] = \prod_{j \in B} e^{2\pi i/2^j}, \qquad (13)$$

where $B = \{j \mid b_j = 1\}$.
Knill [17] shows that arbitrary unitary transformations cannot be efficiently approximated. However, if the phase changes can be concisely described, then they can be approximated to $k$-bit precision using Eq. (13). Let the phase change be represented by a diagonal matrix $D$ with phases $D_{mm} = e^{2\pi i p_m}$, and let $f_j$, for $j = 1, \ldots, k$, be such that $f_j(m)$ is the $j$-th bit of $p_m$. Then $D$ can be implemented to $k$-bit precision using one evaluation of each $f_j$. This can be done by using one of the techniques described in section 2 for each $f_j$, with the $2 \times 2$ phase matrices

$$\begin{pmatrix} 1 & 0 \\ 0 & e^{2\pi i/2^j} \end{pmatrix}.$$

Thus, an arbitrary diagonal matrix $D$ can be approximated to within a phase factor of $e^{2\pi i/2^k}$ in $O(k)$ steps plus the time it takes to compute each of the $f_j$'s.
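Eq. (13) and the $k$-bit truncation can be illustrated with a short sketch (the helper names are hypothetical). It extracts the first $k$ bits of $p$ by repeated doubling and multiplies the corresponding roots of unity; since the discarded tail of $p$ is below $2^{-k}$, the phase error stays below $2\pi/2^k$.

```python
import cmath
import math

def phase_bits(p, k):
    """First k bits b_1 ... b_k of the binary expansion p = 0.b_1 b_2 ... (0 <= p < 1)."""
    bits = []
    for _ in range(k):
        p *= 2
        b = int(p)
        bits.append(b)
        p -= b
    return bits

def approx_phase(p, k):
    """Eq. (13) truncated to k bits: product of e^{2 pi i / 2^pos} over pos with b_pos = 1."""
    phi = 1
    for pos, b in enumerate(phase_bits(p, k), start=1):
        if b:
            phi *= cmath.exp(2j * math.pi / 2 ** pos)
    return phi

# Usage: the phase error shrinks below 2 pi / 2^k as more bits are used.
p = 0.3
exact = cmath.exp(2j * math.pi * p)
errors = {k: abs(cmath.phase(approx_phase(p, k) / exact)) for k in (1, 2, 4, 8, 16)}
```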



4 Decomposition 

A diagonal matrix $D$ of size $N = 2^n$ representing a phase change of an $n$-qubit system can be implemented in $O(n)$ steps if it is decomposable into single-bit rotations on each of the $n$ bits. In this section we give a test for the decomposability of a matrix $D$ with diagonal elements $d_j = D_{jj}$. As multiplying the entire state by a constant phase factor has no physical meaning, we may assume that $d_0 = 1$ without loss of generality.

A diagonal matrix $D$ is single-bit-decomposable if $D = G_{n-1} \otimes \cdots \otimes G_0$, where the $G_k$ are single-bit phase shift gates of the form

$$G_k = \begin{pmatrix} 1 & 0 \\ 0 & g_k \end{pmatrix}.$$

Thus, the elements $d_j$ are of the form $d_j = \prod_{k=0}^{n-1} g_k^{b_k}$, where $b_k$ is the value of the $k$-th bit of the binary expansion of $j$, if and only if $D$ is decomposable. Equivalently, given the binary representation $j = b_{n-1} \ldots b_1 b_0$,

$$d_j = d_{b_{n-1} \ldots b_1 b_0} = g_{n-1}^{b_{n-1}} \cdots g_1^{b_1} g_0^{b_0}. \qquad (14)$$

In particular it follows that

$$g_k = d_{2^k} = D_{2^k 2^k}. \qquad (15)$$

An effective way to test whether $D$ is decomposable is to check whether the $g_k$'s given by Eq. (15) satisfy Eq. (14). For arbitrary phase changes this test is exponential in $n$, but in most practical cases the $d_j$'s will be given by some function in terms of which the test can be performed efficiently.

For any pair $\{x, x'\}$ with $x > x'$ that differ only in bit $k$ of their binary representations, it follows from Eq. (14) and Eq. (15) that $d_x / d_{x'} = g_k = d_{2^k}$. This condition is necessary for decomposability, so it can be used to rule out matrices that are not decomposable.
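A direct, exponential-time version of the test from Eqs. (14) and (15) is sketched below for small $n$; in practice one would apply the same check symbolically to the function defining the $d_j$. The function name and the example angles are hypothetical, and $d_0 = 1$ is assumed as above.

```python
import cmath

def is_decomposable(d, n, tol=1e-9):
    """Check Eqs. (14)-(15): d_j = prod_k g_k^{b_k} with g_k = d_{2^k}.

    Assumes d[0] = 1 (global phase removed). Exponential in n: it inspects every entry.
    """
    g = [d[1 << k] for k in range(n)]          # Eq. (15)
    for j in range(1 << n):
        prod = 1
        for k in range(n):
            if (j >> k) & 1:
                prod *= g[k]
        if abs(d[j] - prod) > tol:              # Eq. (14) fails for this j
            return False
    return True

# Usage: a product of single-bit phase gates (hypothetical angles) passes the test.
n = 3
angles = [0.4, 1.3, 2.2]
d = [cmath.exp(1j * sum(t for k, t in enumerate(angles) if (j >> k) & 1)) for j in range(2 ** n)]
bad = list(d)
bad[3] *= -1                                    # breaks d_3 = g_1 * g_0
```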
5 Permutations



In this section we will discuss efficient ways of implementing permutations. These transformations are often used to reorder states so that subsequent operations can be efficiently implemented. For example, many diagonal matrices are decomposable when the states are ordered in some appropriate way. In contrast to more general unitary operations, permutations take each basis vector to another basis vector, rather than to a superposition of two or more basis vectors.

Every permutation of the $2^n$ basis vectors of an $n$-bit quantum register corresponds to a classical computation on this register, and vice versa. To see this, note that a Toffoli gate ($T$) applied to any 3 bits of an arbitrary quantum state is a permutation of the basis vectors. Since $T$ is complete for all classical (reversible) computations, all such classical computations are permutations. Conversely, each permutation can be decomposed into a sequence of swaps, each of which can be realized by a classical computation.

We consider permutations that are described by a function $g(x)$ of the form $U_g : |x, 0\rangle \mapsto |x, g(x)\rangle$, with the requirement that both the permutation function $g(x)$ and its inverse $g^{-1}(x)$ be computable in polynomial time. These restrictions, which are stronger than those of previous sections, prevent the efficient implementation of permutations like the exchange of the desired state and the state $|00\ldots0\rangle$ described in section 1.1.

The algorithm itself is simple. Every state computes its destination,

$$\sum_x a_x |x, 0\rangle \mapsto \sum_x a_x |x, g(x)\rangle, \qquad (16)$$

after which the $g(x)$ bits erase the $x$ bits. This last step can be accomplished using the exclusive-OR operation and the function $g^{-1}(x')$:

$$\sum_x a_x |x, g(x)\rangle \mapsto \sum_x a_x |x \oplus g^{-1}(g(x)), g(x)\rangle = \sum_x a_x |0, g(x)\rangle. \qquad (17)$$

If the position of the answer is relevant, the right and left parts of the register can always be exchanged by swapping individual qubits.

The total computation time that this operation requires is just the time to compute $g(x)$ plus the time to compute its inverse.

Note that this process turns any classical bijection $g$ of the form $U_g : |x, 0\rangle \mapsto |x, g(x)\rangle$ into an in-place computation of $g$ of the form $U_{g'} : |x\rangle \mapsto |g(x)\rangle$.
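The two steps of Eqs. (16) and (17) can be simulated as follows, assuming the dictionary representation used in the earlier sketches. The bijection $g(x) = 5x + 3 \bmod 8$ and its inverse are hypothetical examples; `permute_in_place` raises an error if the supplied inverse does not actually clear the first register.

```python
def apply_uf(state, f):
    """U_f |x, a> = |x, a XOR f(x)> on {(x, a): amplitude}."""
    out = {}
    for (x, a), amp in state.items():
        key = (x, a ^ f(x))
        out[key] = out.get(key, 0) + amp
    return out

def permute_in_place(amps, g, g_inv):
    """Section 5 sketch: turn U_g |x, 0> = |x, g(x)> into |x> -> |g(x)>.

    amps maps x -> a_x. Returns a dict g(x) -> a_x for the second register.
    """
    state = apply_uf({(x, 0): a for x, a in amps.items()}, g)   # Eq. (16)
    erased = {}
    for (x, y), a in state.items():
        key = (x ^ g_inv(y), y)          # Eq. (17): x XOR g^{-1}(g(x)) = 0
        erased[key] = erased.get(key, 0) + a
    if any(x != 0 for x, _ in erased):
        raise ValueError("g_inv is not the inverse of g")
    return {y: a for (_, y), a in erased.items()}

# Usage on n = 3 bits with a hypothetical bijection g(x) = 5x + 3 mod 8.
g = lambda x: (5 * x + 3) % 8
g_inv = lambda y: (5 * (y - 3)) % 8      # 5 * 5 = 25 = 1 (mod 8)
amps = {x: complex(x + 1) for x in range(8)}
moved = permute_in_place(amps, g, g_inv)
```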

6 Mixing Operations 

For effective quantum algorithms, we also need to be able to efficiently mix amplitudes in a superposition so as to increase the chance of a desired reading being made. One way to achieve this mixing is to combine an efficiently implementable diagonal matrix with a decomposable mixing matrix. For instance, a number of existing algorithms [?, 19] make use of mixing matrices of the form $WDW$, where $D$ is a diagonal matrix and $W$ is the Walsh-Hadamard transform given by

$$W_{xy} = \frac{1}{2^{n/2}} (-1)^{|x \cdot y|},$$

where $|x \cdot y|$ is the number of bit positions in which both $x$ and $y$ are 1. We have described efficient implementations for certain diagonal matrices that can be combined with the Walsh-Hadamard transform or other mixing matrices to achieve desirable amplitude interference.
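The following sketch, a classical simulation with hypothetical helper names, applies $W$ with the standard fast Walsh-Hadamard butterfly and forms $WDW$. With $D$ flipping the sign of $|00\ldots0\rangle$, it illustrates the well-known fact that $WDW = I - 2|s\rangle\langle s|$, where $|s\rangle$ is the uniform superposition: inversion about the mean up to an overall sign, the mixing step used in Grover's algorithm.

```python
import math

def walsh_hadamard(v):
    """Apply W, with W_xy = 2^(-n/2) (-1)^|x.y|, to a length-2^n vector via butterflies."""
    v = list(v)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    scale = 1 / math.sqrt(len(v))
    return [scale * c for c in v]

def wdw(v, d):
    """Mixing step W D W, where D = diag(d)."""
    w = walsh_hadamard(v)
    return walsh_hadamard([dx * c for dx, c in zip(d, w)])

# Usage: D flips the sign of |00...0>, so each amplitude becomes v_x - 2 * mean(v).
v = [0.1, 0.5, -0.2, 0.3, 0.0, 0.4, 0.2, -0.1]
d = [-1] + [1] * 7
mixed = wdw(v, d)
mean = sum(v) / len(v)
```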

Another option for efficiently combining amplitudes, described in the remainder of this section, combines permutations with block-diagonal matrices to perform a different class of mixing operations. These mixing operations partition the standard basis into small subsets, and mix amplitudes only between components in the same subset.



6.1 Polynomial size block matrices 



An extension of the ideas presented so far is to consider matrices with a few off-diagonal elements. Specifically, we will talk about block diagonal matrices made out of equally sized $k \times k$ blocks $\{B_i\}$,

$$M = \begin{pmatrix} B_0 & & & \\ & B_1 & & \\ & & \ddots & \\ & & & B_{N/k-1} \end{pmatrix}. \qquad (18)$$



Many of the techniques used for implementing diagonal matrices can also be used for block matrices. The techniques are particularly useful when all the blocks have the same size $k$, because $k$ must then be a power of two (it divides $N = 2^n$) and the blocks act entirely on the lowest $\log_2 k$ bits. Multiplying by $M$ is equivalent to the higher bits choosing a unitary matrix to apply to the lower bits. This is the analog of states choosing a phase when multiplied by a diagonal matrix.

In this section we will expand the technique discussed in section 2.1. In the diagonal case, we showed how an exponentially sized diagonal matrix could be implemented using a polynomial-sized diagonal matrix. The only restriction on the original matrix was that the number of different phases had to grow polynomially with the number of bits.

For the block diagonal case, we will do the same. We start with an exponentially sized matrix $M$ and reduce it to a polynomial one. Instead of restricting the number of distinct phases, we restrict both the size $k$ of the blocks and the number $a$ of distinct blocks that make up $M$ to be polynomial in $n$. The large matrix $M$ must also be described by a function $f(x)$ that determines the locations of the blocks. If the distinct blocks are labeled with numbers from $0$ to $a - 1$, then $f(x)$ assigns to each state the number of the block in $M$ that would multiply it. Of course, $f(x)$ must assign the same value to any two states that differ only in their lowest $\log_2 k$ bits.

With all the definitions in place we can compute

$$\sum_x a_x |x, 0\rangle \mapsto \sum_x a_x |x, f(x)\rangle. \qquad (19)$$

All that remains is to multiply this state by a polynomial-sized block diagonal matrix, which can be done as follows. For each value $c$ in the range of $f(x)$, define $g(y)$, for $y = f(x)$, to be 1 if $y = c$ and 0 otherwise. Then we multiply the low bits of $x$ by the matrix $B_c$ if and only if $g(y) = 1$. Knill [17] shows that any quantum transformation on $\log_2 k$ qubits can be implemented in at most $O(k^2 \log_2 k)$ operations, so the total number of operations needed to perform all of these steps is $O(a k^2 \log_2 k)$.

In the end, the bits containing $f(x)$ must be erased. This can be done with another call to $U_f$. Hence, this algorithm requires two calls to $U_f$ plus time $O(a k^2 \log_2 k)$, the time it takes to perform each of the $a$ multiplications by the $k \times k$ block matrices.
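A classical simulation of the block construction is sketched below. It assumes blocks act on the lowest $b = \log_2 k$ bits and that $f$ ignores those bits, as required above; the function names, the two example blocks, and $f$ are hypothetical. The loop over `blocks` mirrors the $a$ controlled multiplications, each applied only where the register holds the matching block label.

```python
import math

def apply_uf(state, f):
    """U_f |x, a> = |x, a XOR f(x)> on {(x, a): amplitude}."""
    out = {}
    for (x, a), amp in state.items():
        key = (x, a ^ f(x))
        out[key] = out.get(key, 0) + amp
    return out

def apply_block_diagonal(amps, f, blocks, b):
    """Section 6.1 sketch: apply blocks[f(x)] (k x k, k = 2^b) to the low b bits of x.

    amps is a list of 2^n amplitudes; f must not depend on the low b bits.
    Uses two calls to U_f plus one controlled block multiplication per distinct block.
    """
    k = 1 << b
    state = apply_uf({(x, 0): a for x, a in enumerate(amps)}, f)    # Eq. (19)
    for c, block in enumerate(blocks):
        new = {}
        for (x, reg), amp in state.items():
            if reg != c:                  # g(y) = 0: this block leaves the state alone
                new[(x, reg)] = new.get((x, reg), 0) + amp
                continue
            hi, lo = x & ~(k - 1), x & (k - 1)
            for row in range(k):          # column lo of B_c spreads amp over its k-set
                key = (hi | row, reg)
                new[key] = new.get(key, 0) + block[row][lo] * amp
        state = new
    state = apply_uf(state, f)            # second call to U_f erases f(x)
    return [state.get((x, 0), 0) for x in range(len(amps))]

# Usage: n = 3, k = 2, a = 2 distinct blocks chosen by a hypothetical f.
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
X = [[0, 1], [1, 0]]
f = lambda x: (x >> 1) % 2
amps = [complex(i) for i in range(8)]
out = apply_block_diagonal(amps, f, [H, X], 1)
```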

Note that this technique is very similar to the "quantum direct sum" algorithm given by Høyer [16]. The main difference is that Høyer does not require a polynomial number of different blocks, although he hints that his method can be sped up in certain cases along the lines we have described here. In return, the function $f$ becomes $f(x) = i \bmod m$, where $m$ is the size of each block, and so $f$ can be computed in place without additional qubits.

6.1.1 Combining Permutations and Blocks 

By combining permutations with block matrices, we can form more general mixing matrices. The idea is to divide the states into sets of $k$ elements, called $k$-sets, and then mix them according to some property of the $k$-set.

The first step reorders the states. We assign to each $k$-set a unique number called a group number. We also assign to each state a number from $0$ to $k - 1$, called the member ID, that distinguishes it from the other states in its $k$-set. Using

$$g(x) = \mathrm{group\_number}(x) \cdot k + \mathrm{member\_ID}(x), \qquad (20)$$

we can apply the permutation $x \mapsto g(x)$, which orders the states in blocks corresponding to the $k$-sets.

The second step involves multiplication by a block diagonal matrix $M$ made up of $k \times k$ blocks. The choice of blocks in $M$ given by $f(x)$ will depend only on the group number of each $k$-set. In this fashion, each $k$-set can be mixed in different ways depending on its properties.

The final step uses the permutation corresponding to $g^{-1}(x)$ to send the states back to their original order.

Note that this implementation is efficient only if $k$ is polynomial in $n$ and if $g$, $g^{-1}$, and $f$ are all efficiently computable.
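The three steps (permute by $g$, apply $M$, permute back) can be sketched as a classical simulation on a list of amplitudes. The grouping into complementary pairs $\{x, x \oplus 7\}$, the blocks, and the function names are hypothetical examples chosen only so that Eq. (20) defines a valid permutation.

```python
import math

def mix_k_sets(amps, k, group_number, member_id, blocks, block_of_group):
    """Section 6.1.1 sketch: permute by g, apply the block diagonal M, undo g.

    g(x) = group_number(x) * k + member_id(x) as in Eq. (20); block_of_group
    chooses a block from the group number only. This is a classical simulation
    on a list of 2^n amplitudes, not a gate-level construction.
    """
    n_states = len(amps)
    g = [group_number(x) * k + member_id(x) for x in range(n_states)]
    if sorted(g) != list(range(n_states)):
        raise ValueError("group_number and member_id do not define a permutation")
    permuted = [0] * n_states
    for x in range(n_states):
        permuted[g[x]] = amps[x]                       # |x> -> |g(x)>
    mixed = [0] * n_states
    for start in range(0, n_states, k):               # M acts on consecutive k-blocks
        block = blocks[block_of_group(start // k)]
        for row in range(k):
            mixed[start + row] = sum(block[row][col] * permuted[start + col] for col in range(k))
    return [mixed[g[x]] for x in range(n_states)]     # |g(x)> -> |x>, i.e. apply g^{-1}

# Usage: n = 3, k = 2; hypothetical k-sets are complementary pairs {x, x XOR 7}.
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
X = [[0, 1], [1, 0]]
group = lambda x: min(x, x ^ 7)      # group numbers 0..3, one per pair
member = lambda x: x >> 2            # top bit distinguishes the two members
amps = [complex(i + 1) for i in range(8)]
out = mix_k_sets(amps, 2, group, member, [H, X], lambda grp: grp % 2)
```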
7 Conclusions



In this paper we have discussed a number of non-classical programming tech- 
niques for quantum computers. Several methods for implementing relative phase 
changes on components of an n-qubit state were described, as well as the trade- 
offs between these methods in terms of numbers of additional bits, number of 
calls to Uf, and the number of basic operations needed. Implementations of 
permutations and of block diagonal matrices were also described. Some of these 
techniques are more general than those used in currently known quantum algo- 
rithms. The hope is that they will aid in the development of future quantum 
algorithms. 